Summary

Relaxases are proteins responsible for the transfer of 
plasmid and chromosomal DNA from one bacterium to
another during conjugation. They covalently react with 
a specific phosphodiester bond within DNA origin of 
transfer sequences, forming a nucleo-protein complex 
which is subsequently recruited for transport by a 
plasmid-encoded type IV secretion system. In previ- 
ous work we identified the targeting translocation 
signals presented by the conjugative relaxase TraI of
plasmid R1. Here we report the structure of TraI
translocation signal TSA. In contrast to known translocation
signals we show that TSA is an independent
folding unit and thus forms a bona fide structural 
domain. This domain can be further divided into three 
subdomains with striking structural homology with 
helicase subdomains of the SF1B family. We also show
that TSA is part of a larger vestigial helicase domain 
which has lost its helicase activity but not its single- 
stranded DNA binding capability. Finally, we further 
delineate the binding site responsible for transloca- 
tion activity of TSA by targeting single residues for 
mutations. Overall, this study provides the first evi- 
dence that translocation signals can be part of larger 
structural scaffolds, overlapping with translocation- 
independent activities. 

Introduction 

Type IV secretion systems (T4SS) are protein complexes 
spanning bacterial cell membranes (Alvarez-Martinez 
and Christie, 2009; Wallden et al., 2010; Zechner et al.,
2012). They are used to transport biomolecules such as
proteins, nucleic acids and nucleoprotein complexes 
across the cell envelope. They can be divided into three 
subclasses: (i) the effector protein translocation T4SS 
which are responsible for the delivery of effector proteins 
into the cytoplasm of eukaryotic cells (Terradot and 
Waksman, 2011), (ii) the conjugative systems which 
transfer DNA or nucleoprotein complexes from a donor to 
a recipient strain in a cell contact-dependent manner 
(Fronzes et al., 2009), and finally (iii) T4SS which release
DNA into or mediate uptake from the extracellular milieu
(Hamilton et al., 2005). T4SS have broad clinical significance
not only as virulence factors but also as a major
vehicle for the dissemination of antibiotic resistance 
genes in clinical and natural environments. 

T4SS in Gram-negative bacteria are typically composed 
of 12 proteins termed VirB1-11 and VirD4 based on the 
prototypical Agrobacterium tumefaciens T-DNA delivery 
system. The VirB proteins form a large macromolecular 
translocation channel embedded in both the inner and 
outer membranes (Wallden et al., 2010). T4SS substrates
are thought to be recruited to the VirB machinery by the
VirD4 ATPase, a protein which couples substrate recruitment
to secretion and is therefore commonly referred to as
the type IV coupling protein (T4CP) (Cabezon et al., 1997;
Schroder and Lanka, 2003; Alvarez-Martinez and Christie,
2009). T4CPs mediate multiple protein-protein interactions
with cytoplasmic and inner membrane components of
the secretion system. ATPase activity is associated with 
the release and unfolding of complexes between sub- 
strates and specific chaperones and is required to energize 
the secretion process. 

Conjugation systems are the largest and most widely 
distributed of the T4SS subtypes. The general mechanism 
of nucleoprotein transfer by these systems is well characterized
(de la Cruz et al., 2010). Multiple proteins assemble
on the plasmid origin of transfer (oriT) to form the relaxosome.
This complex prepares the single strand of plasmid
DNA destined for transfer (T-strand) via the nicking-closing
activity of the relaxase enzyme. Initiation of transfer
requires cleavage at a specific position, nic, within oriT. The
reaction is mediated by a tyrosine residue of the enzyme so 
that a covalent protein-DNA adduct is formed. This nucleo- 
protein complex is specifically recognized as a substrate 
by the T4CP and actively pumped through the transport 
apparatus. Once in the recipient the relaxase-T strand
intermediate restores the original circular plasmid molecule
by reversion of the strand transfer reaction.

Fig. 1. Crystal structure of TSA.

A. Schematic diagram of the domain structure of TraI. The N-terminal relaxase (light blue), the two translocation signal domains (TSA and
TSB), and the C-terminal domain (dark blue) are indicated, as well as the ssDNA and helicase domains (orange). The domain involved in
T4SS channel activation via TraD-relaxosome interaction is shown in dark green. Boundary residues for each domain except the RecD
domains (the boundaries of which are still unclear) are reported above the diagram.

B. Purification of TSA. Two bands were observed (left panel), one full-length (TSA) and the other resulting from C-terminal degradation
(TSAΔC). Only TSAΔC crystallized. During crystallization, TSAΔC underwent further proteolytic degradation at the N-terminus, resulting in a
band shown at right (TSAcryst).

C. Topology diagram of TSA. TSA contains three domains (here represented in green, orange and blue) which, due to their structural similarity
with domains 2A and 2B of SF1 helicases, are termed '2A', '2B' and '2B-like' (see Fig. 2). This diagram illustrates the overall organization of
the sequence and domain structure as well as secondary structure composition.

D. Crystal structure of TSA. The three structural domains are in ribbon representation and are colour-coded as in (C).

E. Structural homology between the 2B and 2B-like domain. The middle and C-terminal domains of TSA were superimposed. The two
domains are in ribbon representation in the same orientation as domain 2B in (D). This figure is available in colour online at
wileyonlinelibrary.com.

In the paradigm F and related R1 plasmid systems the
192 kDa protein TraI is the substrate of the T4SS encoded
by the tra genes (Everett and Willetts, 1980; Reygers et al.,
1991; Lang et al., 2010; Dostal et al., 2011). This bifunctional
protein includes both the relaxase activity and a
helicase that assists unwinding of the T-strand. The relaxosome
contains the plasmid-encoded proteins TraM and
TraY, and the host genome-encoded protein IHF. All of
these factors assist relaxase in nic-cleavage and TraM and
IHF additionally stimulate the helicase (Nelson et al., 1995;
Kupelwieser et al., 1998; Csitkovits and Zechner, 2003;
Sut et al., 2009). Although an additional role for these
accessory proteins is not entirely clear, TraM is thought to
recruit the relaxosome to the T4SS via interaction with the
T4CP TraD (Lu et al., 2008; Wong et al., 2011; 2012). TraI
contains four domains (Fig. 1A): a c. 300 residue relaxase
domain, followed by a single-stranded DNA-binding
domain (extending to residue 822) and a helicase domain
(the boundaries of this domain are still undefined but the
domain spans approximately a region encompassing residues
980-1500), and a c. 200 residue domain at the
C-terminus (Dostal and Schildbach, 2010; Lang et al.,
2011). Co-ordinated regulation of ssDNA binding between
the relaxase domain, the ssDNA-binding domain, and the
helicase domain is thought to provide a major control point
for transfer initiation; once cleaved and covalently attached
to the relaxase domain, the plasmid DNA is transferred to
the helicase-associated site to activate ATP-dependent
unwinding. Interaction between TraI and the T4CP TraD is
also key to controlling initiation: TraD indeed stimulates the
relaxase and helicase activities (Mihajlovic et al., 2009; Sut
et al., 2009). Moreover, evidence is emerging that a
complex of TraI bound simultaneously to nic and to the
T4CP has a key function in activating the opening of the
T4SS channel (Lang et al., 2011). To study this mechanism,
Lang et al. used group 1 RNA phage which utilize
conjugative pili as receptors. The phage genome of ssRNA
with a protein attached to the 3' end gains access to the cell
interior by a process not yet defined but requiring the
channel subunits and the T4CP. Docking of a relaxosome
substrate to the transport apparatus is necessary for phage
nucleoprotein penetration of the host cell. A minimal functional
domain termed 'the activation domain' of TraI, comprising
the N-terminal 992 residues including a catalytically
active relaxase, is sufficient to enable nucleoprotein uptake
(Fig. 1A; Lang et al., 2011).

Two different mechanisms for the recruitment of
plasmid-loaded relaxosomes to T4CPs have been proposed.
In F-like systems TraM binds to oriT and has been
described to bind to the C-terminal domain of TraI
(Ragonese et al., 2007); thus, by mediating binding to
TraD, TraI and the DNA, TraM might mediate recruitment
of the relaxosome to the coupling protein. As an alternative
mechanism, direct binding of relaxases to T4CPs has
been reported in some systems (Szpirer et al., 2000;
Schroder et al., 2002; Cascales and Christie, 2004), but
this has not been observed with R1 proteins (A. Redzej,
unpublished).

The features of a substrate protein that specify it for 
secretion have been analysed in a few systems. In some 
cases short, C-terminal, signal sequences of positively 
charged or hydrophobic residues serve as export signals 
for the proteins that contain them (Nagai et al., 2005;
Vergunst et al., 2005). A second group, which includes
TraI, have larger, internally positioned signals. These
regions of TraI, called translocation signals (TS) A (530-816)
and B (1255-1564), target the relaxosome to the
secretion machinery (Lang et al., 2010). Although each TS
alone can confer substrate transport specificity, these
regions share almost no primary sequence similarity except
for a small consensus motif similarly present in the TS of
the R1162 relaxase MobA (Parker and Meyer, 2007). TraI
TSs are widely conserved within the MOBF and MOBQ
families of relaxases (Lang et al., 2010).

In this study we report the crystal structure of the TSA 
domain of the TraI protein from the R1 plasmid. We show
that TSA is composed of three domains, each displaying
structural homologies with SF1B helicase family domains.
Based on the TSA structure, site-directed mutagenesis 
was performed and the influence of the mutations on 
relaxase translocation by the T4S system was evaluated, 
thereby identifying the putative recognition interface 
within TSA that is involved in specific substrate recogni- 
tion and recruitment to the T4S machinery. 

Results 

The DNA encoding the TSA (530-816) region of TraI of the
R1 plasmid system (Fig. 1A) was cloned in an expression
vector downstream of a sequence encoding a His6-tag and
an enterokinase cleavage site. The resulting N-terminally
tagged protein was expressed in Escherichia coli. After
Ni-affinity chromatography, two fragments were observed
on SDS-PAGE, one corresponding to TSA's expected size
(33.7 kDa) and the other shorter at the C-terminus by c.
2-3 kDa, TSAΔC [this shorter fragment still contained its
N-terminal His6-tag as demonstrated using Western blot
analysis and anti-His antibodies (results not shown)].
These fragments could be separated by hydrophobic interaction
chromatography (Fig. 1B). Both fragments were
subjected to crystallization trials; however, only the smaller
fragment crystallized. Analysis of the protein crystals by
SDS-PAGE indicated that, in fact, an even smaller fragment
had crystallized, TSAcryst: N-terminal sequencing of
the crystallized fragment showed the fragment to start at
residue 570 (Fig. 1B). We have shown that removal of
residues 530-568 only moderately lowers the relative efficiency
of translocation, indicating that the crystallized fragment
maintains functional integrity (Lang et al., 2010).
No attempt was made to characterize the C-terminal
sequence of the crystallized fragment. This fragment of
TSA crystallized in the space group P4₁2₁2 with one
molecule in the asymmetric unit. The structure was solved
using the single wavelength anomalous dispersion
phasing method (Rice et al., 2000) from a selenomethionine
(SeMet)-substituted variant of TSA (TSA contains six
methionines), and refined to a final resolution of 1.85 Å
(Table S1).

A model was built between residues 575 and 786. No 
electron density was observed for sequences on either 
side of these boundaries, indicating structural disorder 
beyond these sequences. The crystal structure of the TSA 
domain of the R1 plasmid is composed of three structural 
domains named '2A, 2B and 2B-like' for reasons that 
are explained below. The 2A domain contains both the 
N-terminal and C-terminal ends of the TSAcryst and is
formed by two α-helices and two β strands (Fig. 1C and D,
green). The second and third domains, 2B and 2B-like, are
structurally similar (Fig. 1E) and their fold belongs to the
SH3-like fold family (orange and blue in Fig. 1C and D
respectively). The 2B domain is composed of six β strands
(β2, β8, β9, β10, β11, β12) in an anti-parallel arrangement
forming a central hydrophobic core. The 2B-like domain is
composed of four β strands (β3, β4, β5, β6), two short
helices (C and D) and shares a strand (β7) with the 2B
domain (Fig. 1C). Domain 2B is made of sequences
inserted within domain 2A, while domain 2B-like is made of
sequences inserted within domain 2B (Fig. 1C).

Fig. 2. Structural homology between TSA and RecD helicase.

A. Domain structure of RecD. The structure of a RecD homologue (RecD2 from Deinococcus radiodurans; PDB ID PD 1274) is shown in
ribbon representation and in two orientations, 120° apart along the vertical axis. Domains 1A, 2A and 2B of RecD2 are colour-coded yellow,
green and orange respectively.

B. Superposition of the structure of TSA onto the structure of RecD2. RecD2 is in the same orientation as in (A) but is colour-coded in grey.
TSA is in the same orientation and same colour-coding as in Fig. 1D. Orange and green rectangles indicate the regions, the details of which
are shown in (C).

C. Zoom-in on the superimposed parts. The regions within the orange and green rectangles in (B) are zoomed in to provide a visual
illustration of the structural superposition between equivalent domains in TSA and RecD2. As can be seen, the 2A and 2B domains of TSA
superimpose well onto parts of the 2A and 2B domains of RecD2. However, the 2B-like domain is shown not to be a part of the helicase
domain. This figure is available in colour online at wileyonlinelibrary.com.

A structural homology search using the DALI server
(Holm and Rosenstrom, 2010) was performed which
resulted in the identification of domain 2 of SF1B helicases
as TSA's closest structural homologue: the RecD subunit
of the RecBCD helicase ranked first with a Z-score of 8.9 (a
highly significant score), followed by other SF1B helicase
structures. SF1B helicase structures are typically composed
of two domains, 1 and 2 (Subramanya et al., 1996;
Korolev et al., 1997), each often split into two subdomains,
A and B, with subdomain B composed of a sequence
inserted in the middle of the sequence of subdomain A.
This domain structure is illustrated in Fig. 2A using the
structure of RecD2 from Deinococcus radiodurans as a
typical example of a helicase structure. In this figure, three
of the four subdomains, 1A, 2A and 2B, are colour-coded in
yellow, green and orange respectively; domain 1B is not
coloured as it is very small in RecD2. The structural superposition
of TSA and RecD2 resulting from the DALI search
is shown in Fig. 2B. As can be seen, two domains of TSA
superimpose well with parts of subdomains 2A and 2B of
RecD2 and, accordingly, are named TSA 2A and 2B
domains (in green and orange in Figs 1D and 2B; details in
Fig. 2C). The root-mean-square deviation (rmsd) in Cα
atoms between TSA and RecD2 equivalent domains 2A
and 2B is 2.7 Å. The third TSA domain (in blue in Fig. 1D)
superimposes well with the 2B domain of TSA (rmsd of
2.4 Å) and thus is named '2B-like' (Fig. 1E). Thus, due to
the high structural homology between TSA and SF1B
helicase subdomains, we can conclude that TSA is part of
a helicase structure, its first two domains, 2A and 2B,
matching parts of the structure of the 2A and 2B domains of
SF1B helicases. Its third domain, 2B-like, is not part of the
helicase fold but instead protrudes out as a separate
domain which, in the TSA-RecD2 superposition presented
in Fig. 2B, does not clash with any of the other domains of
RecD2. Thus the 2B-like domain of TraI constitutes a
previously uncharacterized addition to the helicase fold,
which here serves as a signal recognition surface linked to
substrate transport.

Fig. 3. Mapping of investigated surface residues onto the structure of TSA.

A. Mapping of the surface residues mutated in this study. Residues are shown in stick representation. Colour-coding of residues is as follows:
red, pink, yellow and blue depending on whether mutations at these locations impair TSA translocation function significantly (red), mildly
(pink), not at all (yellow) or stimulate TSA translocation function (blue).

B. Structural alignment of TSA and RecD2 showing the location of the surface residues in the context of the full-length helicase domain.

C. Superposition of the structures of TSA (colour-coded as in the other panels) and of the 381-569 structure (Wright et al., 2012; in yellow)
onto the RecD2 structure (in grey). Three domains of the TraI ssDNA-binding domain have homologies with three domains of the helicase fold
(1A, 2A and 2B), suggesting that the ssDNA-binding domain of TraI is a vestigial helicase that has retained ssDNA-binding activity and has
also evolved to function as a TS. This figure is available in colour online at wileyonlinelibrary.com.
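
The rmsd values quoted for the structural superpositions are root-mean-square deviations over paired Cα atoms. As a minimal sketch, assuming the two structures have already been superimposed (for instance by the DALI alignment) and that equivalent residues have been paired, the quantity can be computed as below. The function name and the toy coordinates are hypothetical and are not taken from the TSA or RecD2 coordinates.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Return the RMSD (same unit as the input, e.g. Å) between two
    equally long lists of paired (x, y, z) C-alpha positions.

    Assumption: the two structures are already superimposed; this
    function does not perform any fitting itself.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equally long coordinate lists")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical toy coordinates: three atoms, each displaced by 1 Å along y.
toy_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(ca_rmsd(toy_a, toy_b))
```

A real comparison would first extract the Cα atoms of the equivalent residues from both coordinate files and apply the superposition; only the final averaging step is illustrated here.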

We next attempted to identify which surfaces in TSA 
likely mediate substrate translocation. 

The TSA region of TraI from R1 plasmid differs from that
of the F plasmid by only two residues, S757 and H626 in
R1, substituted by T757 and L626 in F. Despite this minor
variation, the T4SS expressed by these two closely related
plasmids can discriminate between the two TraI TSA
sequences and selectively translocate only the cognate
fragment. Exchanging residue 626 of R1 TraI for its counterpart
in F was shown to change the substrate selection
fidelity from R1 to F for TSA translocation (Lang et al.,
2010). The crystal structure shows that these residues are
located within the 2B domain and on the concave side of
the arch-like TSA structure (Fig. 3A). No plasmid-specific
discrimination was observed for TSA translocation due to
residue 757 (Lang et al., 2010), so this position may lie
outside the crucial recognition region. To define this region
in more detail we carried out Ala and Asn scans on residues
T746, Q736, S739 and D714 (see location of these residues
in Fig. 3A). We also targeted residues 717 and 593,
with the following rationale in mind: R717 is conserved in
TSA, TSB, and also MobA of plasmid R1162 and is part of
a small consensus motif; substitution of R717 by Gln in
TSB was shown to affect substrate translocation (Lang
et al., 2010); a nearby insertion (LDR) in MobA also
blocked translocation (Parker and Meyer, 2007). Thus, we
mutated R717 to Gln in TSA. Concerning residue 593 (an
Ala in R1 and F TraI proteins), insertion of a 31-residue
sequence after this residue in the TraI relaxase encoded by
the F plasmid eliminated detectable translocation of TSA
(Haft et al., 2006; Lang et al., 2010), so we also sought to
evaluate the effect of mutating this residue to a more bulky
Val side-chain. Note that both R717 and A593 are also in
the 626/757 region (see oval in Fig. 3B).

The different site-specific mutants were generated and
tested in the CRAfT assay, which is based on the fusion of
a Cre reporter enzyme to the protein of interest. Secretion
of the fusion protein is monitored via Cre-mediated recombination
of a reporter cassette in recipient cells. Even
though the TSA region is the minimal fragment sufficient for
protein translocation to the recipient strain during conjugation
(Lang et al., 2010; translocation frequency ~2 × 10"^),
site-specific mutations were introduced into the TraI(1-992)
construct (translocation frequency ~1 × 10"'*) to improve
the readout of the assay and to enable comparisons with
another functional test requiring productive TraI binding
interactions as described below. The expression and
stability of all the mutant variants fused to Cre were evaluated
by Western blot analysis (Fig. S1) as well as by
measuring the recombination efficiency upon transformation
(Fig. S2). Recombination efficiency was determined
by introducing the same amount of DNA from each genetic
construct investigated into the reporter strain, and calculating
the number of chloramphenicol-resistant colonies
(recombinants) divided by the number of ampicillin-resistant
colonies (transformants). The results indicated
that Cre-TraI(1-992) could catalyse recombination with ~80%
the efficiency of Cre alone, and the rest of the fusion
proteins showed recombination rates between ~80% and
~120% of wild type (Fig. S2). The CRAfT assay results are
presented in Fig. 4 and mapped onto the structure of TSA
in Fig. 3.
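
The recombination efficiency described above is a simple ratio of colony counts, which is then expressed relative to a reference construct. The sketch below illustrates that arithmetic only; the function names and the colony counts are hypothetical and do not reproduce the data of Fig. S2.

```python
def recombination_efficiency(cm_resistant, amp_resistant):
    """Recombinants (chloramphenicol-resistant colonies) divided by
    transformants (ampicillin-resistant colonies)."""
    if amp_resistant <= 0:
        raise ValueError("at least one transformant colony is required")
    return cm_resistant / amp_resistant


def percent_of_reference(efficiency, reference_efficiency):
    """Express an efficiency as a percentage of a reference construct."""
    if reference_efficiency <= 0:
        raise ValueError("reference efficiency must be positive")
    return 100.0 * efficiency / reference_efficiency


# Hypothetical colony counts, chosen only to illustrate the calculation.
cre_alone = recombination_efficiency(cm_resistant=500, amp_resistant=1000)
cre_fusion = recombination_efficiency(cm_resistant=400, amp_resistant=1000)
print(round(percent_of_reference(cre_fusion, cre_alone), 1))
```

Normalizing to the reference construct in this way is what allows the fusion proteins to be compared on a single percentage scale.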

Five mutations resulted in significant loss of transloca- 
tion activity: A593V, D714N, Q736N, S739A and S739N, 
indicating that four residues, 593, 714, 736 and 739 play 
an important role as recognition surfaces for substrate 
export. In addition, mutation of residue 746 to Ala or Asn 
resulted in significant increase in substrate transport, also 
suggesting a role for this residue in translocation. H626 
also appears to be involved since mutation to its equivalent
residue in F TraI does reduce translocation efficiency,
albeit to a lesser extent than the residues mentioned
previously. Finally, mutation of R717 to Asn or of 757 to Thr has
no effect. Thus, these results clearly define a surface on
TSA that is responsible for substrate recognition and
transport (shown in the oval in Fig. 3B). Since direct
binding between TraI and TraD has never been observed
(indicating either that the interaction is dominated by a
rapid koff or that a third partner might be involved either by
providing additional binding energy or by inducing conformational
changes in one of the binding partners resulting
in additional surfaces being involved in binding), we could
not confirm whether any of these mutations either reduces
or enhances interactions between the two proteins.

Fig. 4. Translocation assay of TSA variants. Relative translocation
frequencies (recombinants per donor cell) of the indicated variants
(left) were compared with wild-type Cre-TraI(1-992) from plasmid R1
(grey bars). Values (expressed as percentage of wild type)
represent the mean of at least three independent experiments.
Standard deviations are shown (*P < 0.05; **P < 0.01;
***P < 0.001).
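
The quantity plotted in Fig. 4 is a translocation frequency (recombinants per donor cell) expressed as a percentage of wild type, averaged over independent experiments. A minimal sketch of that summary is given below. It assumes, as an illustration only, that each variant is normalized to the wild-type frequency measured in the same experiment; the function names and numbers are hypothetical, and the significance testing mentioned in the legend is not reproduced.

```python
from statistics import mean, stdev


def translocation_frequency(recombinants, donors):
    """Recombinant colonies per donor cell."""
    if donors <= 0:
        raise ValueError("donor count must be positive")
    return recombinants / donors


def relative_translocation(variant_freqs, wild_type_freqs):
    """Return (mean, standard deviation) of the variant frequency
    expressed as a percentage of wild type, one value per experiment.

    Assumption: the two lists are paired by experiment.
    """
    if len(variant_freqs) != len(wild_type_freqs) or len(variant_freqs) < 2:
        raise ValueError("need at least two paired experiments")
    percents = [100.0 * v / w for v, w in zip(variant_freqs, wild_type_freqs)]
    return mean(percents), stdev(percents)


# Hypothetical frequencies from three independent experiments.
variant = [2e-5, 3e-5, 4e-5]
wild_type = [1e-4, 1e-4, 1e-4]
avg, sd = relative_translocation(variant, wild_type)
print(f"{avg:.1f}% of wild type (SD {sd:.1f})")
```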

It was previously observed that infection of cells harbouring
the R1 plasmid by the R17 phage can be achieved in
the presence of a minimal fragment TraI(1-992) comprising the
relaxase catalytic activity, the relaxase-associated ssDNA
binding site and TSA. Indeed, this protein fragment
together with the DNA is sufficient to activate the T4SS
channel for phage entry (Lang et al., 2011). This raises the
question as to whether the proposed recognition interface
on TSA is necessary for R17 phage infection and in turn for
the activation of the T4SS channel. In order to test this
hypothesis all the mutants affecting residues shown to
participate in substrate recognition were tested in a
phage assay. Interestingly, all of them were able to support
phage infection at the same level as the wild-type protein
(Fig. S3). This indicates that although mutations of these
residues in TraI(1-992) impair efficient translocation of the
protein fragment, they were not sufficient to impair the
function of the TraI activation domain in nucleoprotein
import. This finding will need to be further investigated but
indicates that a wider range of protein-protein and/or
protein-DNA interactions are involved in the activation
process.

Discussion 

Relaxases are important proteins involved in DNA trans- 
port by T4SS. These proteins form covalent nucleo-protein 
complexes called relaxosomes which serve as substrate 
for T4SS-mediated transport. Previous studies have 
shown that two regions of the TraI relaxase encoded by the
R1 plasmid act as recognition signals, termed TSA and
TSB. This is unusual: typically, transport signal sequences
are short (5-25 residues) sequences located at either
the N- or C-terminal end of translocated substrates. For
example, translocation signals for the A. tumefaciens T4S
system or for the Dot/Icm T4S system from Legionella
pneumophila are small clusters of positively charged or
hydrophobic residues located at the C-terminus of the
translocated substrates (Nagai et al., 2005; Vergunst
et al., 2005). Translocation signals can however be more
complex: for example, the Bartonella effector proteins
have bipartite translocation signals where a charged
C-terminal region is combined with one or more
copies of a second motif located within the so-called 'Bep
intracellular delivery (BID) domain' inside the protein
sequence (Schulein et al., 2005). In TraI, TSA and TSB are
much longer sequences and they are located within the
protein sequence, not at the ends. Both function independently
of each other.

Because signal sequences are usually short, of unusual 
amino acid composition, and generally located at either 
the N- or C-terminal ends of translocated substrates, they 
are believed to be unstructured. Moreover, attempts at
structurally characterizing internal translocation signal
regions of secreted effectors have been inconclusive. For
example, Voulhoux et al. (2000) identified a domain within
exotoxin A, a type II secretion system effector, which
appears to act as a translocation signal; however, the
functional characterization of this domain was based on
removing entire secondary structures (the domain contains
six helices) and the only helix (αF) deletion that
resulted in significantly reduced translocation is a crucial
secondary structure element, the removal of which was
bound to drastically perturb the entire structure. A second
example is HasA, a type I secretion system effector (Masi
and Wandersman, 2010), where residues within the HasA
sequence were shown to affect translocation of the sub- 
strate but did not cluster within any particular region of the 
structure, indicating that the effects on translocation were 
likely conformational, rather than impairing the targeting 
of HasA to the secretion system. 

Thus, with the work presented here, we demonstrate for 
the first time that translocation signal sequences can (i) 
fold into defined, well-characterized structures, and (ii) be
integrated structurally within the more extended scaffold
of proteins of well-known functions.

With its two terminal ends located near one another 
and interacting in a two-stranded beta-sheet, TSA indeed 
forms an independent folding unit, i.e. a self-contained 
structural domain. Its three subdomains, 2A, 2B and 
2B-like, are formed by sequences inserted within one
another, 2B within 2A and 2B-like within 2B. There is no
precedent for such an observation. Yet, folded TSs might
be more widespread than originally believed: for example,
the BID region of Bep proteins might also form folded 
domains, although it cannot be excluded that the actual 
sequences responsible for translocation by BID regions 
might be located in disordered parts. 

Remarkably, we show here that TSA is part of a more 
extensive helicase structure. Indeed, the closest homo- 
logue to TSA is domain 2 of canonical helicases. TSA is 
known to reside in a larger ssDNA-binding domain, the 
boundaries of which have recently been defined, extending
from residue 381 to residue 858 (Cheng et al., 2011). The
structure of the N-terminal part of this domain from
residue 381 to 569 has been solved and shown to be
similar to subdomain 1A of SF1B helicases (Wright et al.,
2012). Thus, the ssDNA-binding region of TraI contains
three of the four canonical SF1B family subdomains, 1A, 
2A and 2B. The parts of this domain, which have now 
been structurally characterized, constitute 70% of the 
entire sequence of the domain. Thus, it can be safely 
concluded that the ssDNA-binding domain is a vestigial 
helicase structure, which has lost its helicase activity but 
retained its ssDNA-binding capability. 

Finally, the TSA structure demonstrates that translocation
signals can be part of defined, well-characterized and
larger structures supporting a completely different function.
This is also an unprecedented observation. Translocation
signals are usually self-contained functional entities with no
associated function other than mediating specific substrate
recruitment to cognate transporters. Here we demonstrate
that TSA is part of a vestigial helicase structure.
Interestingly, TSB is also part of a helicase domain, but 
this time, the domain has retained helicase activity. These 
structural and functional features are clearly conserved 
among relaxases. Whether they will turn out to be found in 
other proteins remains to be demonstrated; however, TSA 
provides a template that might well prove of general use 
by transporters and secretion systems for interactions 
with their cognate substrates. 

Experimental procedures 

Strains, plasmids and primers 

All E. coli strains used in this study are described in Table S2.
Plasmids and primers are described in Tables S3 and S4
respectively.

Antibiotics, enzymes and reagents

Antibiotics were added at the indicated concentrations:
ampicillin (100 µg ml⁻¹), chloramphenicol (10 µg ml⁻¹), kanamycin
(40 µg ml⁻¹), spectinomycin (75 µg ml⁻¹), tetracycline
(8 µg ml⁻¹).

DNA preparation and PCR amplification 

Plasmid DNA was purified from E. coli cells with the QIAprep
Spin Miniprep Kit (Qiagen, Hilden, Germany). Restriction
endonucleases and Antarctic phosphatase were purchased
from New England Biolabs (Beverly, MA, USA). T4 DNA
ligase was purchased from Fermentas GmbH (St. Leon-Rot,
Germany). DNA fragments for cloning were amplified using
Phusion High-Fidelity DNA Polymerase (Finnzymes Oy,
Espoo, Finland). Enzymes were used according to manufacturers'
recommendations.



Construction of the TSA overexpression plasmid

The expression construct of TSA (pCDF_TSA) with an N-terminal
6xHis tag and enterokinase cleavage site was generated by
inserting the PCR-amplified DNA from R1-16 between the
HindIII-BamHI sites in the pCDF1b vector.

Construction of Cre-fusion plasmids

CreTraIN992 was constructed by ligating the SalI fragment
from CreTraI(309-992) with SalI-restricted CreTraIΔC1227.
The insert for CreTraIN992 F was amplified with primers
TraLSFWI and TraLSRevS from pOX38, cut with KpnI and
ligated with CFB B. Two-step PCR was used to generate
CreTraIN992 derivatives named for the corresponding point
mutations in TSA. In the first step, primer sets 1
(TraISeqFW2 + forward primer with the desired mutation) and 2
(TraLSRevS + reverse primer with the desired mutation)
were used to amplify two fragments from R1-16. In the second
step, these two fragments were annealed and amplified with
primer set 3 (TraISeqFW2 + TraLSRevS). The fragments
were cut with SalI and religated with CreTraIΔC1227.
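
The two-step PCR described above follows the general logic of overlap-extension mutagenesis. As a minimal sketch, assuming a single-base substitution and treating sequences as plain strings, the two steps could be simulated as follows. The template sequence, the mutation position and the overlap length are hypothetical placeholders and do not represent the primers or inserts used in this study.

```python
def two_step_pcr(template: str, pos: int, new_base: str, overlap: int = 10) -> str:
    """Simulate overlap-extension mutagenesis of one base (0-based `pos`)."""
    if not overlap <= pos < len(template) - overlap:
        raise ValueError("mutation site too close to the template ends")

    # Region covered by the two complementary mutagenic primers (assumed length).
    mutated_site = (
        template[pos - overlap:pos] + new_base + template[pos + 1:pos + 1 + overlap]
    )

    # Step 1: two separate amplifications, each pairing an outer primer with
    # one mutagenic primer, so both products carry the mutated overlap.
    fragment_1 = template[:pos - overlap] + mutated_site
    fragment_2 = mutated_site + template[pos + 1 + overlap:]

    # Step 2: the fragments anneal through the shared overlap and are
    # extended and amplified with the two outer primers.
    return fragment_1 + fragment_2[len(mutated_site):]


# Hypothetical 30-nt template; the G at index 15 is replaced by A.
template = "ATGGCTAGCAAGCTTGGATCCGAATTCTAA"
product = two_step_pcr(template, pos=15, new_base="A", overlap=5)
```

The product has the same length as the template and differs only at the targeted position, which is the property that allows the mutated fragment to be cut and religated like the wild-type insert.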

Construction of expression plasmids

pGZTraIN992 derivatives were constructed in two steps. First, a
1.5 kb EcoRI/SalI fragment encoding the first 507 amino
acids of TraI was isolated from pCG02 and ligated into
pGZ119EH (Lessl et al., 1992) to generate pGZpartTraI. In
step two, the traI coding region for amino acids 507-992 was
reconstructed by ligating the SalI fragment of the desired
CreTraIN992 mutants into pGZpartTraI linearized with SalI.

Protein purification 

For overexpression of R1 TraI(530-816), a 1 l culture of E. coli
BL21(DE3) star carrying pCDF_TSA was grown at 37°C in LB
medium containing 75 μg ml⁻¹ spectinomycin to an A600 of 0.6,
when IPTG was added to a final concentration of 1 mM. After an
additional overnight incubation at 16°C, the cells were
harvested by centrifugation and the pellets frozen at -80°C. Cells
were thawed overnight at 4°C, resuspended in 15 ml of buffer
I (50 mM sodium phosphate, 250 mM sodium chloride, 10 mM
imidazole, 0.02% sodium azide, pH = 7.5), and lysed by two
passages through an EmulsiFlex-C5 (Avestin). The soluble
fraction was obtained by ultracentrifugation at 21 000 g for 1 h. The
supernatant was filtered through a 0.45 μm filter prior to
loading onto a HisTrap column (GE Healthcare) equilibrated in
buffer I. After washing the column with buffer I and 10% of
buffer II (buffer I containing 500 mM imidazole), the protein
was eluted by applying a 50 ml gradient to 60% buffer II.
Fractions containing protein were adjusted with 3 M ammonium
sulphate to a final concentration of 1 M, combined and
loaded onto a HiTrap Phenyl HP column equilibrated with
buffer III (50 mM sodium phosphate, 100 mM sodium chloride,
1 mM EDTA pH = 8.0, 1 M ammonium sulphate, pH = 7.5).
After washing the column with three column volumes of 50%
buffer IV (50 mM sodium phosphate, 100 mM sodium chloride,
1 mM EDTA pH = 8.0, pH = 7.5), TSAΔC and full-length
TSA were eluted sequentially by applying a 50-100% buffer IV
gradient. Fractions containing protein were combined and
concentrated with Amicon filter devices (Millipore), and the
protein was injected onto a 120 ml Sephacryl 200 HR column
(GE Healthcare) equilibrated with buffer V (10 mM Tris
pH = 7.5, 150 mM magnesium chloride). The protein eluted as
a single peak. Fractions containing protein were concentrated
to 15 mg ml⁻¹ and either immediately used for crystallization or
adjusted with glycerol to 40% (v/v) and stored at -80°C. The
apparent molecular mass of the full-length protein of 33.7 kDa
was confirmed by denaturing polyacrylamide gel
electrophoresis and Coomassie blue staining. The N-terminal five
residues (IISEPD) of TSAΔcryst were assessed by Edman
degradation analysis (PNAC Facility, University of
Cambridge). SeMet-containing TSAΔcryst was produced as
recommended by Molecular Dimensions and was purified using a
protocol similar to that used for the wild-type protein.
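
The buffer percentages quoted in the purification can be translated into concentrations with simple mixing arithmetic. The sketch below is illustrative only: it assumes that buffer II contains 500 mM imidazole in total, that volumes are additive, and that the pooled fractions contain no ammonium sulphate before adjustment. The 10 ml pool volume is a hypothetical example value.

```python
def mixed_concentration(c_a: float, c_b: float, fraction_b: float) -> float:
    """Concentration after mixing buffers A and B, with B at `fraction_b` (0 to 1)."""
    return (1.0 - fraction_b) * c_a + fraction_b * c_b


def stock_volume_needed(sample_ml: float, c_stock: float, c_target: float,
                        c_sample: float = 0.0) -> float:
    """Volume of stock (ml) to add so that the mixture reaches `c_target`."""
    if not c_sample <= c_target < c_stock:
        raise ValueError("target must lie between sample and stock concentration")
    return sample_ml * (c_target - c_sample) / (c_stock - c_target)


# Imidazole (mM) during the HisTrap wash and at the end of the gradient,
# assuming buffer I = 10 mM and buffer II = 500 mM imidazole in total.
wash_mM = mixed_concentration(10.0, 500.0, 0.10)
elution_end_mM = mixed_concentration(10.0, 500.0, 0.60)

# 3 M ammonium sulphate stock needed to bring a hypothetical 10 ml pool to 1 M.
stock_ml = stock_volume_needed(10.0, c_stock=3.0, c_target=1.0)
```

Under these assumptions the wash corresponds to about 59 mM imidazole, the gradient ends near 304 mM, and 5 ml of stock would be added per 10 ml of pooled fractions.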



TSAΔcryst crystallization

TSAΔcryst was crystallized by the hanging drop vapour
diffusion method (McCoy et al., 2007) using a solution of 0.1 M
HEPES pH = 7.5, 0.2 M sodium sulphate, 25% PEG 3350 as
reservoir at a temperature of 16°C. Crystals appeared after 3 days.
TSAΔcryst crystallized in space group P4₁2₁2, with unit cell
dimensions of a = b = 109.17 Å, c = 56.80 Å, and diffracted to
a resolution of 1.85 Å.
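
As a rough consistency check on the unit cell dimensions reported above, the cell volume and a Matthews coefficient can be estimated as sketched below. The number of molecules per asymmetric unit and the molecular mass used here are assumptions for illustration: the 33.7 kDa value is the full-length mass quoted in the purification, and the crystallized fragment is smaller, so the true coefficient would be somewhat higher.

```python
def tetragonal_cell_volume(a: float, c: float) -> float:
    """Unit cell volume in cubic angstroms for a tetragonal cell (a = b, 90 degree angles)."""
    return a * a * c


def matthews_coefficient(cell_volume: float, n_sym_ops: int,
                         n_mol_asu: int, mass_da: float) -> float:
    """Matthews coefficient V_M in cubic angstroms per dalton."""
    return cell_volume / (n_sym_ops * n_mol_asu * mass_da)


volume = tetragonal_cell_volume(a=109.17, c=56.80)

# P4(1)2(1)2 has 8 symmetry operators. One molecule per asymmetric unit and
# a mass of 33.7 kDa are assumptions for illustration only.
v_m = matthews_coefficient(volume, n_sym_ops=8, n_mol_asu=1, mass_da=33_700.0)

# Standard Matthews approximation for the solvent fraction of a protein crystal.
solvent_fraction = 1.0 - 1.23 / v_m
```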



Structure determination 

The SeMet-containing crystals of TSAΔcryst were used
for phasing. The data sets of the SeMet-TSA and native-TSA
were collected at beamline ID14-2 at the European
Synchrotron Radiation Facility (ESRF). A data set was collected at a
wavelength of 0.8726 Å and processed using the XDS suite.
The unmerged HKL file was converted into an mtz file using
POINTLESS. REINDEX and SCALA (Winn et al., 2011) were
used for scaling and separating the anomalous pairs. The
unmerged intensities were then converted into merged
amplitudes using C-TRUNCATE (Winn et al., 2011). Heavy-atom
search and SAD phase calculations were carried out using
PHASER (McCoy et al., 2007). Phases were further
improved by solvent flattening using DM (Winn et al., 2011).
The solvent-flattened phases were used for model building
using BUCCANEER (Winn et al., 2011). Building of the initial
model was completed manually using COOT. Restrained
refinement and one round of TLS refinement were performed
using REFMAC (Winn et al., 2011). The model was manually
inspected and corrected in COOT using 2Fo-Fc maps
(Emsley and Cowtan, 2004). The final model was validated
using PROCHECK (Winn et al., 2011).

Western blot analysis 

For each sample, a total of 2 ml of mid-log-phase cells was
pelleted and resuspended in 80 mM Tris-HCl (pH = 6.8), 6%
DTT, 6% SDS, 12% glycerol and 0.04% bromophenol blue.
Samples were heated to 96°C for 10 min and then
centrifuged. The equivalent of 0.1 A600 units was loaded on a
12.5% SDS-PAGE gel and run for 90 min at 10 mA. Transfer was
done overnight at 4°C at 90 mA onto a Millipore Immobilon-P
membrane. Membranes were dried for 2 h at RT and soaked
briefly in 100% methanol before blocking overnight at 4°C in
1x TBS (20 mM Tris-HCl pH = 7.5, 150 mM NaCl) with 3%
BSA (albumin V). After washing two times for 5 min with TBS,
primary anti-Cre rabbit polyclonal antibody (Novagen,
69050-3) was used in TBS with 3% BSA (1:10 000 dilution)
for 2 h at RT. After washing [2 x 10 min with TBS, 1 x 10 min
with TBS + 0.1% Tween 20 (TBST)], the membrane was incubated
for 1 h at RT with peroxidase-conjugated anti-rabbit IgG
(Sigma, A0545). After 4 x 10 min washing with TBST,
detection was done with an ECL kit (GE Healthcare).

CRAfT (Cre recombinase assay for translocation) 

The Cre fusion reporter assay was performed as described
previously (Lang et al., 2010). E. coli MS411 carrying the
plasmids of interest and recipient CSH26Cm::LTL were used.
Donors were selected on plates containing appropriate
antibiotics (see Table S2). Transconjugants and recombinants
were identified by plating serial dilutions on plates containing
kanamycin (40 μg ml⁻¹) and X-Gal (100 μg ml⁻¹) or
chloramphenicol (10 μg ml⁻¹), respectively. Conjugation and protein
translocation frequencies were calculated as transconjugants
or recombinants per donor, respectively.
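
The per-donor frequencies defined above can be computed from plate counts as in the following sketch. All colony counts, dilution factors and plated volumes are hypothetical example values, and the function names are introduced here for illustration only.

```python
def cfu_per_ml(colonies: int, dilution_factor: float, plated_ml: float) -> float:
    """Colony-forming units per ml of the undiluted culture."""
    if plated_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / plated_ml


def frequency_per_donor(event_cfu_per_ml: float, donor_cfu_per_ml: float) -> float:
    """Transconjugants (or recombinants) per donor cell."""
    if donor_cfu_per_ml <= 0:
        raise ValueError("donor titre must be positive")
    return event_cfu_per_ml / donor_cfu_per_ml


# Hypothetical counts from 0.1 ml plated at the stated dilution factors.
donors = cfu_per_ml(colonies=120, dilution_factor=1e6, plated_ml=0.1)
transconjugants = cfu_per_ml(colonies=85, dilution_factor=1e4, plated_ml=0.1)
recombinants = cfu_per_ml(colonies=40, dilution_factor=1e1, plated_ml=0.1)

conjugation_frequency = frequency_per_donor(transconjugants, donors)
translocation_frequency = frequency_per_donor(recombinants, donors)
```

Normalizing both values to the same donor titre keeps DNA transfer and protein translocation directly comparable within one mating.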

Infection studies with the male-specific phage R17

Fresh phage lysate was prepared as described previously
(Lang et al., 2011). Liquid infection assays as described in
Lang et al. (2011) were modified for 24-well cell culture plates
(Greiner Bio-One). Briefly, 900 μl LB medium containing
2 mM CaCl2 and the appropriate antibiotics was inoculated
to A600 0.02 with the desired strain. R17 phage lysate (100 μl)
was added to give a multiplicity of infection of 10. Cultures
were grown at 37°C with shaking, and cell lysis was
determined by measuring the A600 from 0 to 240 min post infection
using an xMark™ Microplate Spectrophotometer (Bio-Rad).
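
The lysate titre needed to reach a given multiplicity of infection follows from the number of cells in the well. The sketch below is a hedged illustration: the conversion of 8e8 cells per ml per A600 unit is an assumed rule of thumb for E. coli that would need calibration for the strains used, and the A600 readings in the example are hypothetical.

```python
# Assumed rule-of-thumb conversion; calibrate for the strain and instrument.
CELLS_PER_ML_PER_A600 = 8e8


def required_phage_titre(a600: float, culture_ml: float, lysate_ml: float,
                         moi: float,
                         cells_per_ml_per_a600: float = CELLS_PER_ML_PER_A600) -> float:
    """Lysate titre (pfu per ml) needed to reach `moi` phage per cell."""
    cells = a600 * cells_per_ml_per_a600 * culture_ml
    return moi * cells / lysate_ml


def normalize_to_start(readings: list[float]) -> list[float]:
    """Express each A600 reading relative to the reading at time zero."""
    if not readings or readings[0] <= 0:
        raise ValueError("need a positive reading at time zero")
    return [value / readings[0] for value in readings]


# 900 microlitres at A600 0.02 infected with 100 microlitres of lysate at MOI 10.
titre = required_phage_titre(a600=0.02, culture_ml=0.9, lysate_ml=0.1, moi=10)

# Hypothetical A600 time course for one infected well.
relative_density = normalize_to_start([0.02, 0.05, 0.09, 0.06, 0.03])
```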